Scalable Classification over SQL Databases
نویسندگان
چکیده
We identify data-intensive operations that are common to classifiers and develop a middleware that decomposes and schedules these operations efficiently using a backend SQL database. Our approach has the added advantage of not requiring any specialized physical data organization. We demonstrate the scalability characteristics of our enhanced client with experiments on Microsoft SQL Server 7.0 by varying data size, number of attributes and characteristics of decision trees.
منابع مشابه
Scalable Mining for Classification Rules in Relational Databases
Data mining is a process of discovering useful patterns (knowledge) hidden in extremely large datasets. Classification is a fundamental data mining function, and some other functions can be reduced to it. In this paper we propose a novel classification algorithm (classifier) called MIND (MINing in Databases). MIND can be phrased in such a way that its implementation is very easy using the exten...
متن کاملScalable Numerical SPARQL Queries over Relational Databases
We present an approach for scalable processing of SPARQL queries to RDF views of numerical data stored in relational databases (RDBs). Such queries include numerical expressions, inequalities, comparisons, etc. inside FILTERs. We call such FILTERs numerical expressions and the queries numerical SPARQL queries. For scalable execution of numerical SPARQL queries over RDBs, numerical operators sho...
متن کاملConQuer: A System for Efficient Querying Over Inconsistent Databases
Although integrity constraints have long been used to maintain data consistency, there are situations in which they may not be enforced or satisfied. In this demo, we showcase ConQuer, a system for efficient and scalable answering of SQL queries on databases that may violate a set of constraints. ConQuer permits users to postulate a set of key constraints together with their queries. The system...
متن کاملDatabase Implementation of a Model-Free Classifier
Most methods proposed so far for classification of high-dimensional data are memory-based and obtain a model of the data classes through training before actually performing any classification. As a result, these methods are ineffective on (a) very large datasets stored in databases or datawarehouses, (b) datawhose partitioning into classes cannot be captured by global models and is sensitive to...
متن کاملNewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management
One of the key advances in resolving the “big-data” problem has been the emergence of an alternative database technology. Today, classic RDBMS are complemented by a rich set of alternative Data Management Systems (DMS) specially designed to handle the volume, variety, velocity and variability of Big Data collections; these DMS include NoSQL, NewSQL and Search-based systems. NewSQL is a class of...
متن کامل